A Covariance Approach to Item Analysis
Charles T. Myers
Educational Testing Service 



Abstract 

This paper brings together a variety of item-analysis
techniques into a coherent system. The system is based on classical
test theory, the theorems that can be derived from the equation X = T + E.
The system extends from techniques for analyzing parts of an item separately
to techniques for relating items to total test score, to sub-scores, to
external criterion scores, and to test-retest situations. The system is
both mathematically simple and basic and may serve to bring test theory and
test practice closer together.



A Covariance Approach to Item Analysis 
Charles T. Myers 
Educational Testing Service 

Introduction

A variety of different item analysis techniques have been developed
to serve a variety of purposes in test construction and test theory. A number
of these item analysis techniques provide two indices for each item, usually
a "difficulty" index and an index showing a relationship between score on the
item and score on the test of which the item is a part, an index of item-test
"homogeneity." There are three different types of correlation coefficient that
are commonly used for this purpose: the biserial, the point-biserial, and the
phi coefficients. With this much variety available to the test developer and
the test theorist (not to mention other possibilities such as the use of item
characteristic curves), it does not seem necessary to advance a new alternative.
Actually, it is the purpose of this paper to discuss some of the advantages of
an old but rarely used approach and to indicate how it can be extended to some
new techniques and purposes. This approach appears to have the advantage of
simplicity, so that it is easy to understand, to compute, and to criticize. It
seems to have a minimum of assumptions and it is closely related to some of the
basic concepts of test theory. Its principal distinction is that it uses
covariances rather than correlations for its indices of homogeneity.

With a square matrix of variances and covariances for any set of
variables, the sum of all the variances and covariances equals the variance
of the sum of the variables. If the variables are the items in a test, the sum
of all the item variances and inter-item covariances equals the variance of
the test, the sum of a row or column equals an item-test covariance, and
the sum of the item-test covariances also equals the test variance. This
is true whether the items are scored dichotomously zero and one or by any
other scoring system, but with zero and one scoring there are some interesting
simplifications. For one thing, if the mean score of an item (the proportion
passing in this case) is known, then the item variance is fixed (the variance
is dependent on the mean), which is, of course, not true for the general case
of score distributions.
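
The summation properties described above can be checked numerically. The following is a minimal sketch, not part of the original paper: the score matrix is invented for illustration, and population variances and covariances (division by N) are assumed throughout.

```python
import numpy as np

# Hypothetical 0/1 score matrix: 6 examinees (rows) by 4 items (columns).
scores = np.array([
    [1, 1, 1, 0],
    [1, 1, 0, 0],
    [1, 0, 1, 1],
    [0, 1, 0, 0],
    [1, 1, 1, 1],
    [0, 0, 0, 0],
], dtype=float)

# Item variance-covariance matrix, using population formulas (divide by N).
cov = np.cov(scores, rowvar=False, bias=True)

# Total test score for each examinee and its population variance.
total = scores.sum(axis=1)
test_variance = total.var()

# Each row sum of the matrix is the covariance of that item with the total.
item_test_cov = cov.sum(axis=1)

print("sum of all matrix entries equals test variance:",
      np.isclose(cov.sum(), test_variance))
print("sum of item-test covariances equals test variance:",
      np.isclose(item_test_cov.sum(), test_variance))
```

The same checks hold for any scoring system, since they depend only on the additivity of covariances and not on zero-one scoring.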

A correlation matrix may be thought of as a covariance matrix of 
variables with standardized scores, and a common procedure for computing a 
correlation matrix is first to find the covariance matrix and then divide 
each row and column by the appropriate standard deviation. The principal 
advantage of correlations as compared with covariances is to assist in the 
interpretation of these statistics, just as standardized test scores assist 
in score interpretation. The principal disadvantage of correlations is that 
it is more difficult to interpret the meaning of sums of correlation 
coefficients than it is to interpret the meaning of sums of covariances. 

Also, there is still controversy over the appropriate technique for standardizing
item-test correlations: some favor biserials and others favor point-biserials.
Since items that are scored zero and one are standardized as to range and
since item variance is fixed by item difficulty, standardization may not be
so useful for items and it may be more useful in item analysis to use
covariances rather than correlations. It should be noted that an item-test
covariance may be divided by the standard deviation of the test to produce
a statistic that seems to share some of the advantages of covariances and
some of the advantages of correlations. In this paper this statistic will
be called the item effectiveness index and will be symbolized by $e_{it}$.

Procedures 



In classical test theory (the theory derived from the equation
X = T + E), the three principal statistical characteristics of a test are
its mean, its standard deviation (or its variance), and its reliability.
This covariance approach to item analysis is so closely related to these
aspects of classical test theory that it may be appropriate to call it
classical item analysis. This system of item analysis provides three
indices for each item. These indices are related, respectively, to
the test mean, the test standard deviation, and the test reliability.
In each case the test statistic is obtained merely by summing the item
indices. This system also provides techniques for evaluating parts of
the test and even parts of the items, that is, for splitting the test atom.
Finally, this system includes a technique for evaluating test and item
validity when criterion data are available. Although in its simplest form
this system assumes that items are scored either zero or one, it may easily
be extended to formula-score tests.

The first moment of a test score distribution is its mean and
the first statistic in this classical covariance approach to item analysis
is the item mean. The item mean is the proportion passing the item,
symbolized $p_i$. The sum of the proportions passing all the items in a
test equals the test mean. The item variance is determined by the item mean
and equals the item mean times one minus the item mean, $s_i^2 = p_i(1 - p_i)$
(Horst, 1966).
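
As an illustration of the first index, the sketch below computes the item means for a small invented data set and confirms the two relations just stated. The data are hypothetical, and the variance is the population variance (division by N), which is what makes $s_i^2 = p_i(1 - p_i)$ exact.

```python
import numpy as np

# Hypothetical 0/1 score matrix: 6 examinees (rows) by 4 items (columns).
scores = np.array([
    [1, 1, 1, 0],
    [1, 1, 0, 0],
    [1, 0, 1, 1],
    [0, 1, 0, 0],
    [1, 1, 1, 1],
    [0, 0, 0, 0],
], dtype=float)

# Item mean = proportion passing the item.
p = scores.mean(axis=0)

# Test mean computed directly from the total scores.
test_mean = scores.sum(axis=1).mean()

# Population item variances computed directly from the scores.
item_variance = scores.var(axis=0)

print("proportions passing:", np.round(p, 3))
print("sum of p equals test mean:", np.isclose(p.sum(), test_mean))
print("item variance equals p(1 - p):", np.allclose(item_variance, p * (1 - p)))
```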

The second moment of a test score distribution is its variance and
the second statistic in this item analysis system is the item effectiveness
index, $e_{it}$. This index is computed by finding the item-test covariance and
dividing it by the standard deviation of the test. The sum of item effectiveness
indices for all the items in the test equals the test standard deviation.
One value that is gained by using the item effectiveness index instead of the
item-test covariance is that it facilitates comparisons between items taken
from tests of different lengths. This index is the same as the item-analysis
index that was called the "reliability index" in Gulliksen's Theory of Mental
Tests (1950) and symbolized $r_{xg} s_g$. It is obvious that a correlation
coefficient multiplied by the standard deviation of one variable is equal to
the covariance divided by the standard deviation of the other variable.
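
The effectiveness index can be sketched in a few lines. The example below uses invented data and population statistics (division by N); it is an illustration of the definitions above rather than a reproduction of any computation in the original paper. It checks that the indices sum to the test standard deviation and that each index equals the point-biserial correlation times the item standard deviation.

```python
import numpy as np

# Hypothetical 0/1 score matrix: 6 examinees (rows) by 4 items (columns).
scores = np.array([
    [1, 1, 1, 0],
    [1, 1, 0, 0],
    [1, 0, 1, 1],
    [0, 1, 0, 0],
    [1, 1, 1, 1],
    [0, 0, 0, 0],
], dtype=float)

total = scores.sum(axis=1)
s_t = total.std()                      # population SD of the test

cov = np.cov(scores, rowvar=False, bias=True)
item_test_cov = cov.sum(axis=1)        # row sums = item-test covariances
e = item_test_cov / s_t                # item effectiveness indices

# Point-biserial correlation of each item with the total score,
# computed independently as an ordinary Pearson correlation.
r_pbis = np.array([np.corrcoef(scores[:, i], total)[0, 1]
                   for i in range(scores.shape[1])])
s_i = np.sqrt(np.diag(cov))            # item standard deviations

print("effectiveness indices:", np.round(e, 3))
print("indices sum to test SD:", np.isclose(e.sum(), s_t))
print("e equals r_pbis * s_i:", np.allclose(e, r_pbis * s_i))
```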

Many item analysis systems provide only two indices for each item.
The two indices that have been described for this system may be used for
most of the purposes that item analysis has been used for, both for test
production and for elementary classical test theory. However, the covariance
approach is compatible with a number of other statistics that are logical
extensions of the system. The first of these relates item analysis to test
reliability through Woodbury's (1951) concept of the standard length of a
test. The reliability discussed here is the Kuder-Richardson (1937)
formula 20 reliability. This third item coefficient has been called the
"length" of the item and has been symbolized by $k_i$ (Myers, 1961).

The standard length of a test, as defined by Woodbury, is the
number of items required for the test to have a reliability of .50. If
the standard length is taken as a unit, then the length of an item is the
fraction of that unit that is represented by a single item. If Kuder-Richardson
formula 20 reliability is understood to be the ratio of true
variance to observed variance, it implies that the true variance of an item
is equal to the average of the covariances of that item with all the other
items in the test. If that average is subtracted from the item variance, the
remainder is understood to be error variance by this definition. The sum of
these remainders is equal to the variance of errors of measurement for the test.
The length of an item is computed by dividing the average covariance of the
item by the remainder or error variance. The sum of the item lengths is
equal to the length of the test in standard length units. The reliability
of the test is easily computed by dividing the test length (in standard
length units) by one plus that length.
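
The computation just described can be sketched as follows. The data are invented, population statistics are assumed, and the variable names (`c_bar_i`, `u_i`, `k_i`) are chosen here to mirror the appendix symbols. The sketch computes the reliability two ways: from the summed item lengths, as the text describes, and from the average covariance and average error variance. The second route is algebraically identical to Kuder-Richardson formula 20; the first route coincides with it when the item error variances are equal and need not match it exactly otherwise.

```python
import numpy as np

# Hypothetical 0/1 score matrix: 6 examinees (rows) by 4 items (columns).
scores = np.array([
    [1, 1, 1, 0],
    [1, 1, 0, 0],
    [1, 0, 1, 1],
    [0, 1, 0, 0],
    [1, 1, 1, 1],
    [0, 0, 0, 0],
], dtype=float)

n = scores.shape[1]
cov = np.cov(scores, rowvar=False, bias=True)
item_var = np.diag(cov)
item_test_cov = cov.sum(axis=1)
test_var = cov.sum()

# Average covariance of each item with the other n - 1 items.
c_bar_i = (item_test_cov - item_var) / (n - 1)

# Remainder treated as item error variance under this definition.
u_i = item_var - c_bar_i

# Item lengths in standard-length units, and reliability from their sum.
k_i = c_bar_i / u_i
r_from_lengths = k_i.sum() / (1 + k_i.sum())

# Reliability from the average covariance and average error variance.
c_bar, u_bar = c_bar_i.mean(), u_i.mean()
r_from_averages = n**2 * c_bar / (n**2 * c_bar + n * u_bar)

# Kuder-Richardson formula 20 computed in its usual form, for comparison.
kr20 = n / (n - 1) * (1 - item_var.sum() / test_var)

print("item lengths:", np.round(k_i, 3))
print("reliability from summed lengths:", round(r_from_lengths, 3))
print("reliability from averages:", round(r_from_averages, 3))
print("KR-20:", round(kr20, 3))
```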

There is another possible use for the average inter-item covariance
statistic. It has often been found difficult to interpret an item analysis
of sub-scores or part scores in a test. Typically these sub-scores are
fairly highly positively correlated and the distinctions between them
are subtle. Many item analysis homogeneity coefficients include an element,
sometimes called a spurious element, produced by the perfect correlation of
the item with itself. This element makes the interpretation of the subtle
differences between sub-scores very difficult. When the average inter-item
covariance is computed as it was in the previous paragraph, the item variance
is not included in the average. Thus this statistic should clarify the
analysis of tests into homogeneous sub-scores.
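
One way to apply this idea is sketched below. The assignment of items to sub-tests "A" and "B" and the helper `mean_cov_with_group` are hypothetical, introduced only for illustration. For each item, the sketch averages its covariances with the items of each sub-test, leaving out the item's own variance so that no spurious element enters the comparison.

```python
import numpy as np

# Hypothetical 0/1 score matrix: 6 examinees (rows) by 4 items (columns).
scores = np.array([
    [1, 1, 1, 0],
    [1, 1, 0, 0],
    [1, 0, 1, 1],
    [0, 1, 0, 0],
    [1, 1, 1, 1],
    [0, 0, 0, 0],
], dtype=float)

cov = np.cov(scores, rowvar=False, bias=True)

# Hypothetical assignment of item indices to two sub-tests.
subtests = {"A": [0, 1], "B": [2, 3]}

def mean_cov_with_group(item, group):
    """Average covariance of `item` with the items in `group`,
    excluding the item itself (its variance is never included)."""
    others = [j for j in group if j != item]
    return cov[item, others].mean()

# One row per item: average covariance with each sub-test.
table = {
    item: {name: mean_cov_with_group(item, group)
           for name, group in subtests.items()}
    for item in range(cov.shape[0])
}

for item, row in table.items():
    print(item, {name: round(value, 3) for name, value in row.items()})
```

An item that belongs with its assigned sub-test would be expected to show a larger average covariance with that sub-test than with the other.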

The covariance approach to item analysis lends itself conveniently
not only to the analysis of tests into sub-scores, but also to the analysis
of a single item into its parts. As anyone knows who has found a miskeyed
item in an item analysis, it is possible to do distracter analyses as well as
item analyses. The homogeneity index for a distracter is usually negative.
In the covariance analysis, the sum of the effectiveness indices for all the
responses to an item is always equal to zero; therefore the sum of the indices
for all distracters equals minus one times the index for the correct answer,
that is, if no one omits the item. If some omit the item, that response can
also be analyzed. Thus, the standard deviation of a test equals minus one
times the sum of the effectiveness indices for all the responses other than
the correct responses. Sums of separate categories of these responses may
be of interest. For example, a test speededness index may be computed from
the sum of the effectiveness indices for all the responses of not reaching
an item.
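
The distracter relations above can be illustrated with a small sketch. The response matrix, the answer key, and the coding of options as integers 0 to 3 are all invented for this example, and no one omits an item. Each response option is treated as its own zero-one variable, and its effectiveness index is its covariance with the total score divided by the test standard deviation.

```python
import numpy as np

# Hypothetical responses: 6 examinees (rows) by 3 items (columns),
# each entry is the option chosen (0-3). No omits in this example.
responses = np.array([
    [0, 2, 1],
    [0, 1, 1],
    [1, 2, 3],
    [0, 2, 1],
    [2, 0, 0],
    [3, 2, 1],
])
key = np.array([0, 2, 1])   # hypothetical correct option for each item
n_options = 4

scores = (responses == key).astype(float)
total = scores.sum(axis=1)
s_t = total.std()           # population SD of the test

def effectiveness(indicator):
    """Covariance of a 0/1 response indicator with the total score,
    divided by the test standard deviation."""
    return np.mean((indicator - indicator.mean()) * (total - total.mean())) / s_t

# e[i, opt] = effectiveness index of option `opt` on item `i`.
e = np.array([[effectiveness((responses[:, i] == opt).astype(float))
               for opt in range(n_options)]
              for i in range(responses.shape[1])])

correct = e[np.arange(len(key)), key]
distracter_sum = e.sum(axis=1) - correct

print("indices for all responses to each item sum to zero:",
      np.allclose(e.sum(axis=1), 0))
print("distracter sum equals minus the keyed index:",
      np.allclose(distracter_sum, -correct))
print("test SD equals minus the sum over all non-keyed responses:",
      np.isclose(-distracter_sum.sum(), s_t))
```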

Gulliksen (1950) has shown how item indices may be used to study
the validity of items when a relevant criterion score is available. The same
procedure might also be used with scores on a retest or parallel test
administered after a learning interval. Using the retest as the criterion
should offer some new insights into the difficult problem of measuring gain
rather than the traditional static measurement of position.
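
A criterion analysis in the same covariance spirit might look like the sketch below. The criterion scores are invented, and dividing the item-criterion covariance by the criterion standard deviation is an assumption made here by analogy with the effectiveness index; the original text does not spell out the formula. Under that assumption the item indices sum to the test-criterion correlation times the test standard deviation.

```python
import numpy as np

# Hypothetical 0/1 score matrix: 6 examinees (rows) by 4 items (columns).
scores = np.array([
    [1, 1, 1, 0],
    [1, 1, 0, 0],
    [1, 0, 1, 1],
    [0, 1, 0, 0],
    [1, 1, 1, 1],
    [0, 0, 0, 0],
], dtype=float)

# Hypothetical criterion (or retest) scores for the same six examinees.
criterion = np.array([3.1, 2.0, 2.7, 1.5, 3.8, 0.9])

total = scores.sum(axis=1)

def pop_cov(a, b):
    """Population covariance (divide by N)."""
    return np.mean((a - a.mean()) * (b - b.mean()))

# Covariance of each item with the criterion.
item_crit_cov = np.array([pop_cov(scores[:, i], criterion)
                          for i in range(scores.shape[1])])

# Assumed index: item-criterion covariance over the criterion SD.
validity_index = item_crit_cov / criterion.std()

test_crit_cov = pop_cov(total, criterion)
r_ty = np.corrcoef(total, criterion)[0, 1]

print("item-criterion covariances sum to test-criterion covariance:",
      np.isclose(item_crit_cov.sum(), test_crit_cov))
print("indices sum to r_ty * s_t:",
      np.isclose(validity_index.sum(), r_ty * total.std()))
```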

Although this already appears to be an extensive and comprehensive 
system of item analysis, it is quite possible that other uses and extensions 
of this system can easily be developed. 
Discussion



The covariance system of item analysis does not appear to have
been widely used, yet it appears to have many worthwhile advantages. It
can be applied to most of the common uses of item analysis in the art of
producing tests. It is simpler to compute than many other systems and it
can be applied to test production by any man who can add and who has a pencil
and pad of paper. It involves the most simple and direct relationships
between item statistics and score distribution statistics. This simplicity
also provides an element of mathematical elegance that might appeal to even
a sophisticated test theorist. Good test production requires highly
sophisticated subject matter competence on the part of the test assembler.
There has often been a difficulty in communication between such persons
and mathematical test theorists. Perhaps the greatest value of the
covariance approach to item analysis is that it may bring these two branches
of expertise more effectively together.
Appendix

Summary of Classical Item Analysis



Where $M_i$ is the mean score on item i, and
$p_i$ is the proportion passing item i, then

$$M_i = p_i . \tag{1}$$

Where $s_i^2$ is the variance of scores on item i, then

$$s_i^2 = p_i (1 - p_i) . \tag{2}$$

Where $M_t$ is the mean score on test t, and
n is the number of items in test t, then

$$M_t = \sum_{i=1}^{n} p_i . \tag{3}$$

Where $c_{it}$ is the covariance of item i and test t, then

$$e_{it} = c_{it} / s_t . \tag{4}$$

Where $e_{it}$ is called the "item effectiveness" index of item i in
test t, and
$c_{ij}$ is the covariance of item i and item j, then

$$c_{it} = s_i^2 + \sum_{j=1,\, j \neq i}^{n} c_{ij} , \tag{5}$$

and

$$s_t^2 = \sum_{i=1}^{n} c_{it} = s_t \sum_{i=1}^{n} e_{it} , \tag{6 and 7}$$

so that

$$s_t = \sum_{i=1}^{n} e_{it} . \tag{8}$$



Where $r_{bis}$ is the item-test biserial correlation, and
$r_{p.bis}$ is the item-test point-biserial correlation, and
$y_i$ is the ordinate of the normal curve at the point that
cuts off a proportion equal to $p_i$, then

$$e_{it} = r_{bis}\, y_i = r_{p.bis}\, s_i , \tag{9 and 10}$$

$$\bar{c}_i = (c_{it} - s_i^2)/(n - 1) , \tag{11}$$

and

$$\bar{c} = \Big(s_t^2 - \sum_{i=1}^{n} s_i^2\Big)\Big/(n^2 - n) = \sum_{i=1}^{n} \bar{c}_i / n . \tag{12 and 13}$$

Where $k_i$ is the "length" of item i in standard length units, then

$$k_i = \bar{c}_i / (s_i^2 - \bar{c}_i) , \tag{14}$$

and where $r_{tt}$ is the test reliability as defined by Kuder-Richardson
formula 20,

$$r_{tt} = \sum_{i=1}^{n} k_i \Big/ \Big(1 + \sum_{i=1}^{n} k_i\Big) , \tag{15}$$

and where $u_i$ is defined as $s_i^2 - \bar{c}_i$, then

$$r_{tt} = n^2 \bar{c} / (n^2 \bar{c} + n \bar{u}) . \tag{16}$$

Where d represents distracters in a 5-choice item and if no one omits
the item, then

$$e_{it} = (-1)\Big(\sum_{d=1}^{4} e_{dt}\Big) . \tag{17}$$
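
Equations 9 and 10 imply that either correlation can be recovered from the effectiveness index by a single division. The sketch below illustrates this with invented data and population statistics; the helper `biserial_direct` uses the usual textbook form of the biserial coefficient and is included only as an independent check, not as part of the original appendix.

```python
import numpy as np
from statistics import NormalDist

# Hypothetical 0/1 score matrix: 6 examinees (rows) by 4 items (columns).
scores = np.array([
    [1, 1, 1, 0],
    [1, 1, 0, 0],
    [1, 0, 1, 1],
    [0, 1, 0, 0],
    [1, 1, 1, 1],
    [0, 0, 0, 0],
], dtype=float)

total = scores.sum(axis=1)
s_t = total.std()
p = scores.mean(axis=0)
s_i = np.sqrt(p * (1 - p))

# Effectiveness indices from the row sums of the covariance matrix.
cov = np.cov(scores, rowvar=False, bias=True)
e = cov.sum(axis=1) / s_t

# Ordinate of the unit normal curve at the point cutting off proportion p.
std_normal = NormalDist()
y = np.array([std_normal.pdf(std_normal.inv_cdf(float(pi))) for pi in p])

r_pbis = e / s_i    # equation 10 rearranged
r_bis = e / y       # equation 9 rearranged

def biserial_direct(item):
    """Textbook biserial: (M_pass - M_fail) / s_t * p * q / y."""
    passed = item == 1
    pi = item.mean()
    yi = std_normal.pdf(std_normal.inv_cdf(float(pi)))
    return (total[passed].mean() - total[~passed].mean()) / s_t * pi * (1 - pi) / yi

print("point-biserials:", np.round(r_pbis, 3))
print("biserials:", np.round(r_bis, 3))
```

With very few examinees, as in this toy example, the biserial can exceed the point-biserial considerably; only the agreement between the two routes is of interest here.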
Where $M_{t_f}$ is the mean score of test t computed by the formula:
score = right - 1/4 wrong, and
$s_{t_f}$ is the standard deviation of formula scores, and
$p_{+i}$ is the proportion answering correctly, and
$p_{-i}$ is the proportion answering incorrectly, then

$$M_{t_f} = \sum_{i=1}^{n} p_{+i} - \frac{1}{4} \sum_{i=1}^{n} p_{-i} , \tag{18}$$

and

$$s_{t_f} = \sum_{i=1}^{n} e_{+it} - \frac{1}{4} \sum_{i=1}^{n} e_{-it} . \tag{19}$$
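
The formula-score relations can be checked with a short sketch. The response matrix, the key, and the use of -1 to code an omitted item are all assumptions made for this illustration. Right and wrong responses are treated as separate zero-one variables, and their effectiveness indices are taken against the formula score.

```python
import numpy as np

# Hypothetical responses: 6 examinees (rows) by 3 five-choice items (columns).
# Entries are the option chosen; OMIT marks an item left unanswered.
OMIT = -1
responses = np.array([
    [0, 2, 1],
    [0, 1, OMIT],
    [1, 2, 3],
    [0, 2, 1],
    [2, OMIT, 0],
    [3, 2, 1],
])
key = np.array([0, 2, 1])   # hypothetical correct option for each item

right = (responses == key).astype(float)
wrong = ((responses != key) & (responses != OMIT)).astype(float)

# Formula score: rights minus one fourth of wrongs.
formula = right.sum(axis=1) - wrong.sum(axis=1) / 4
s_f = formula.std()         # population SD of formula scores

def effectiveness(indicators):
    """Column-wise covariance with the formula score, divided by its SD."""
    centered = indicators - indicators.mean(axis=0)
    return (centered * (formula - formula.mean())[:, None]).mean(axis=0) / s_f

e_plus = effectiveness(right)
e_minus = effectiveness(wrong)

mean_from_items = right.mean(axis=0).sum() - wrong.mean(axis=0).sum() / 4
sd_from_items = e_plus.sum() - e_minus.sum() / 4

print("mean recovered from item proportions:",
      np.isclose(mean_from_items, formula.mean()))
print("SD recovered from effectiveness indices:",
      np.isclose(sd_from_items, s_f))
```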